Abstract

Most speech and natural language systems developed for English and
other Indo-European languages neglect morphological processing and
integrate speech and natural language at the word level. For
agglutinative languages such as Korean and Japanese, however,
morphological processing plays a major role in language processing,
since these languages have very complex morphological phenomena and
relatively simple syntactic functionality. Degenerate morphological
processing limits the usable vocabulary size of the system, and a
word-level dictionary results in an exponential explosion in the
number of dictionary entries. For agglutinative languages, we need
sub-word level integration that leaves room for general morphological
processing.

In this paper, we develop a phoneme-level integration model of speech
and linguistic processing through general morphological analysis for
agglutinative languages, and an efficient parsing scheme for that
integration.

Korean is modeled lexically, based on the categorial grammar formalism
with unordered argument and suppressed category extensions, and a
chart-driven connectionist parsing method is introduced.

1 Introduction 

Spoken language processing poses the challenge of integrating speech
recognition into natural language processing, and must deal with
multi-level knowledge sources from the signal level to the symbol
level. Integrating and handling this multi-level knowledge increases
the technical difficulty of both speech and natural language
processing. On the speech recognition side, recognition must be at the
phoneme level for large-vocabulary continuous speech, and the speech
recognition module must provide the right level of output to the
natural language module: not a single solution but many alternative
solution hypotheses. The n-best list (Chow & Schwartz 1989),
word-graph (Oerder & Ney 1993), and word-lattice (Murveit et al. 1993)
techniques are most often used for this purpose. The speech
recognition module can also request linguistic scores from the
language processing module in a more tightly coupled
bottom-up/top-down hybrid integration scheme (Paul 1989). On the
natural language side, the insertion, deletion, and substitution
errors of continuous speech must be compensated for by robust parsing
and partial parsing techniques, e.g. (Baggia & Rullent 1993). Spoken
language is often ungrammatical and fragmentary, contains disfluencies
and speech repairs, and must be processed incrementally under time
constraints (Menzel 1994).

Most speech and natural language systems developed for English and
other Indo-European languages neglect morphological processing and
integrate speech and natural language at the word level (Bates et al.
1993; Agnas et al. 1994). These systems often employ a pronunciation
dictionary for speech recognition and independent dictionaries for
natural language processing. However, for agglutinative languages such
as Korean and Japanese, morphological processing plays a major role in
language processing, since these languages have very complex
morphological phenomena and relatively simple syntactic functionality.
Unfortunately, even systems for spoken Japanese apply degenerate
morphological techniques (Hanazawa et al. 1990; Sawai 1991).
Degenerate morphological processing limits the usable vocabulary size
of the system, and a word-level dictionary results in an exponential
explosion in the number of dictionary entries. For agglutinative
languages, we need sub-word level integration that leaves room for
general morphological processing.

In this paper, we propose a parsing scheme for spoken Korean, called
chart-driven connectionist categorial parsing, which is integrated
with speech recognition at the phoneme level.

2 Overall parser architecture 

We call our parsing scheme chart-driven connectionist categorial
parsing. Korean is modeled by an extended categorial grammar and
parsed by a connectionist method that is controlled by charts.

3 Extension of Categorial Grammar 

3.1 Directional categorial grammar 

The directional categorial grammar naturally represents some Korean
morphemes, such as postpositions (noun-endings and verb-endings),
adverbs, and pre-nouns. A directional categorial grammar (Uszkoreit
1986; Zeevat 1988) is defined as an ordered quintuple
$G = \langle V, C, S, R, f \rangle$, where

• V: the vocabulary set,

• C: a finite set of basic categories, which generates the full set
$C'$ of categories by recursive application of the following category
formation rules:

    if $a \in C$, then $a \in C'$, and
    if $a \in C'$ and $b \in C'$, then $a/b \in C'$ and $a \backslash b \in C'$,

• S: the category for sentences,

• R: a set of functional application rules:

    left cancellation: $a \;\; b \backslash a \Rightarrow b$
    right cancellation: $b/a \;\; a \Rightarrow b$

• f: an assignment function from elements of V to subsets of $C'$.

Here are a few example assignments:
say (new)  np/np
pha-il (file)  np
tul (-s)  np\np
ul (objective case marker)  np[obj]\np

These assignments can be used to build an np[obj] structure for the
eojeol "pha-il-tul-ul", as shown in figure 1.
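
The two cancellation rules can be illustrated with a minimal sketch. It assumes categories are either plain strings or (result, slash, argument) triples, and that the eojeol is reduced strictly left to right; the names `Functor`, `combine`, and `reduce_left_to_right` are illustrative and not part of the system described here.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Functor:
    """A category result/arg or result\\arg, written as in the text above."""
    result: "Cat"
    slash: str  # "/" expects its argument on the right, "\\" on the left
    arg: "Cat"


Cat = Union[str, Functor]


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Apply left or right cancellation to two adjacent categories."""
    # left cancellation:  a  b\a  =>  b
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    # right cancellation:  b/a  a  =>  b
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


# Example assignments taken from the text.
LEXICON = {
    "say": Functor("np", "/", "np"),
    "pha-il": "np",
    "tul": Functor("np", "\\", "np"),
    "ul": Functor("np[obj]", "\\", "np"),
}


def reduce_left_to_right(morphemes):
    """Fold the morphemes of one eojeol into a single category, or None."""
    cat = LEXICON[morphemes[0]]
    for morpheme in morphemes[1:]:
        cat = combine(cat, LEXICON[morpheme])
        if cat is None:
            return None
    return cat


print(reduce_left_to_right(["pha-il", "tul", "ul"]))
```

A strict left-to-right fold is enough for this eojeol because every suffix looks leftward for its argument; a full parser would have to consider other bracketings as well.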

3.2 Unordered arguments extension 

Korean has relatively free word order compared to SOV languages (Lee,
Lee, & Lee 1994). For example, the sentence "ku-ka sa-kwa-lul
mek-nun-ta" (which means "he eats an apple") can also be written as
"sa-kwa-lul ku-ka mek-nun-ta". These word-order variations cannot be
modeled by the pure directional CG, so we extend the category
formation and functional application rules to deal with this
phenomenon:



• if $a \in C$, then $a \in C'$

• if $a \in C'$ and $S \subseteq C'$, then $a/S \in C'$ and $a \backslash S \in C'$

• left cancellation

    $a_i \;\; b \backslash \{a_1, \ldots, a_n\} \Rightarrow b \backslash \{a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n\}$

• right cancellation

    $b / \{a_1, \ldots, a_n\} \;\; a_i \Rightarrow b / \{a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n\}$

                 np[obj]
    np        np\np     np[obj]\np
    pha-il    tul       ul

Figure 1: "pha-il-tul-ul" parsed by the directional CG.

The sentences "ku-ka sa-kwa-lul mek-nun-ta" and "sa-kwa-lul ku-ka
mek-nun-ta" can be parsed with the following category assignments
(figure 2):

ku (he)  np
ka (subj. case-marker)  np[subj]\np
sa-kwa (apple)  np
lul (obj. case-marker)  np[obj]\np
mek (eat)  s\{np[subj],np[obj]}
nun-ta (declarative modal)  s[tDEC]\(s\$X)
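
The following sketch illustrates how left cancellation over an argument set makes the parse independent of the order of the noun phrases. It rests on several assumptions that are not stated in the text: a functor whose argument set becomes empty is treated as its bare result category, only leftward functors are modeled, the modal ending nun-ta is left out, and a simple stack-based reduction stands in for the actual parser. All names are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union


@dataclass(frozen=True)
class SetFunctor:
    """A leftward category result\\{a1, ..., an} with an unordered argument set."""
    result: str
    args: FrozenSet[str]


Cat = Union[str, SetFunctor]


def left_cancel(arg: Cat, functor: Cat) -> Optional[Cat]:
    """a_i  b\\{a_1..a_n}  =>  b\\{the same set without a_i}."""
    if not isinstance(functor, SetFunctor) or arg not in functor.args:
        return None
    rest = functor.args - {arg}
    # Assumption: an exhausted argument set leaves the bare result category.
    return SetFunctor(functor.result, rest) if rest else functor.result


LEXICON = {
    "ku": "np",
    "ka": SetFunctor("np[subj]", frozenset({"np"})),
    "sa-kwa": "np",
    "lul": SetFunctor("np[obj]", frozenset({"np"})),
    "mek": SetFunctor("s", frozenset({"np[subj]", "np[obj]"})),
}


def reduce_all(morphemes):
    """Shift each category and cancel against its left neighbour while possible."""
    stack = []
    for morpheme in morphemes:
        stack.append(LEXICON[morpheme])
        while len(stack) >= 2:
            merged = left_cancel(stack[-2], stack[-1])
            if merged is None:
                break
            stack[-2:] = [merged]
    return stack


print(reduce_all(["ku", "ka", "sa-kwa", "lul", "mek"]))
print(reduce_all(["sa-kwa", "lul", "ku", "ka", "mek"]))
```

Both word orders reduce to the same single category because the verb removes whichever of its arguments it meets first.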



3.3 Suppressed categories extension 

After a few experiments with the directional CG with the unordered
argument extension, we found that many structural ambiguities resulted
even from simple Korean sentences. For example, "say pha-il-tul" (new
files) is parsed in two ways (figure 3).

These ambiguities arise because the categorial assignments we used to
model Korean were too simple and imposed no order on rule
applications. Here, we suggest a way to inhibit figure 3 (a) through
suppressed and activator categories. A suppressed category is a
category with | instead of / or \, which can be changed to a category
with / or \ by an activator category. An activator category is a
category with a suppressed category as its argument and an ordinary
category as its result. The following categorial assignments show
suppressed and activator categories:

say (new) np/np 

pha-il(file) np| 

tul(plural suffix) np\(np|) 

ul(objective case marker) np[obj]\np. 



[Parse trees (a) and (b), one for each word order, each deriving
s[tDEC] from the np[subj] and np[obj] phrases, mek
s\{np[subj],np[obj]}, and nun-ta s[tDEC]\(s\$X).]

Figure 2: Word-order independent parse





    np/np    np        np\np
    say      pha-il    tul

[Parse trees (a) and (b) over these categories.]

Figure 3: Two alternative parses for "say pha-il-tul", generated by CG




    np/np    np|       np\(np|)    np[obj]\np
    say      pha-il    tul         ul

Figure 4: Correct parses for "say pha-il-tul" generated by CG with
unordered arguments, and suppressed and activator categories.



The np| is a suppressed category: it is prevented from combining with
"say", which forces the noun and the suffix to combine first. The
np\(np|) is an activator category whose argument is a suppressed
category and whose result is an ordinary category. Whether a morpheme
gets an ordinary, a suppressed, or an activator category is determined
by the morphological conditions. A noun followed by a suffix gets a
suppressed category, and a suffix followed by a noun-ending gets an
activator category. This assignment is handled by the morphological
classification information and the morpheme connectivity matrix.

Figure 4 shows the parse for the "say pha-il-tul" 
using suppressed and activator categories. 

These two classes of categories lexically model Korean noun structures
and predicate structures, and can be used for other agglutinative
languages.
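
The effect of suppressed and activator categories on ambiguity can be checked with a small sketch that counts the distinct binary parse trees of "say pha-il-tul" under both sets of assignments. It assumes a suppressed category can be represented as a distinct wrapper type that never matches its ordinary counterpart; the exhaustive span enumeration and all names are illustrative and are not the parsing method of this paper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Suppressed:
    """A suppressed category such as np|, distinct from the ordinary np."""
    base: str


@dataclass(frozen=True)
class Functor:
    result: "Cat"
    slash: str  # "/" expects its argument on the right, "\\" on the left
    arg: "Cat"


Cat = Union[str, Suppressed, Functor]


def combine(left, right):
    """Left or right cancellation; suppressed np| never matches plain np."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def derivations(cats, i, j):
    """Return {category: number of distinct trees} for the span cats[i:j]."""
    if j - i == 1:
        return {cats[i]: 1}
    found = {}
    for k in range(i + 1, j):
        for lcat, lcount in derivations(cats, i, k).items():
            for rcat, rcount in derivations(cats, k, j).items():
                merged = combine(lcat, rcat)
                if merged is not None:
                    found[merged] = found.get(merged, 0) + lcount * rcount
    return found


# "say pha-il tul" with ordinary categories only.
plain = [Functor("np", "/", "np"), "np", Functor("np", "\\", "np")]
# The same morphemes with a suppressed noun and an activator suffix.
suppressed = [
    Functor("np", "/", "np"),
    Suppressed("np"),
    Functor("np", "\\", Suppressed("np")),
]

print(derivations(plain, 0, 3))
print(derivations(suppressed, 0, 3))
```

With ordinary categories two trees derive np, while the suppressed noun can only be consumed by the activator suffix, which leaves a single tree.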

4 Chart-driven connectionist categorial parsing

The syntax analysis is performed by interactive relaxation (spreading
activation) parsing on the categorial grammar, where the positions of
the functional applications are controlled by a triangular table. The
original interactive relaxation parsing (Howells 1988) was extended to
provide efficient constituent searching and expectation generation
through the positional information provided by the categorial grammar
and the triangular table (Lee & Lee 1995 in press), and to parse the
input morpheme lattice at once. The interactive relaxation process
consists of the following three steps, which are executed repeatedly:
1) add nodes, 2) spread activation, and 3) decay.

• add nodes 

Grammar nodes (syntactic categories from the dictionary) are added for
each sense of the morphemes when the parsing begins. A grammar node
whose activation exceeds the predefined threshold generates new nodes
in the proper positions (to be discussed shortly). The newly generated
nodes search for constituents (expectations) that are in the
appropriate table positions and have categories to which functional
application applies.

• spread activation 

Bottom-up spreading activation passes the following amount to parent
i:

    $n \times p \times a \times \frac{a_i^2}{\sum_{j=1}^{n} a_j^2}$

where a predefined portion p of the total activation a is passed
upward to the parent with activation $a_i$ among the n parents, each
with node activation $a_j$. In other words, a node with large
activation gets more and more activation, which produces an inhibitory
effect without explicit inhibitory links (Reggia 1987).

Top-down spreading activation distributes uniformly

    $p' \times a$

among the children, where p' is a predefined portion of the source
activation a.

• decay 

The node's activation decays with time. A node with fewer constituents
than needed is penalized in addition to the decay:

    a × (1 …

where a is an activation value, d is a decay ratio, and $C_a$ and
$C_r$ are the numbers of actual and required constituents. After the
decay, a node with less activation than the predefined removal
threshold is removed from the table.
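
The two spreading rules can be sketched numerically. This is a minimal illustration that assumes the bottom-up share of a parent is proportional to its squared activation, normalized over all n parents and scaled by n, p, and a; that reading of the bottom-up formula is an assumption. The default values of p and p' are the ones reported in the experiments, and the function names are illustrative. The decay step is left out.

```python
def bottom_up_shares(a, parent_activations, p=0.05):
    """Activation sent upward from a node with activation `a` to each parent.

    Assumed form: n * p * a * a_i**2 / sum_j a_j**2, so parents that are
    already strongly activated receive a larger share.
    """
    n = len(parent_activations)
    total = sum(x * x for x in parent_activations)
    if n == 0 or total == 0:
        return [0.0] * n
    return [n * p * a * (x * x) / total for x in parent_activations]


def top_down_share(a, p_down=0.03):
    """Activation sent downward to every child: the same p' * a for each."""
    return p_down * a


# Two equally active parents each receive p * a.
print(bottom_up_shares(1.0, [0.5, 0.5]))
# A stronger parent takes most of the upward activation.
print(bottom_up_shares(1.0, [0.8, 0.2]))
print(top_down_share(1.0))
```

Under this assumed form, equally active parents each receive p × a, which matches the uniform top-down case, while unequal parents compete for the upward activation.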

The node generation and constituent search positions are controlled by
the triangular table. When the node a(i,j) acts as an argument, it
generates a node only in a position (k,j) where $1 \le k < i$, and the
generated node searches for its constituents (functors) only in the
position (k,i-1). Alternatively, when the node is generated in a
position (i,k) where $j < k \le$ number-of-morphemes, it searches the
position (j+1,k) for its constituents. When the node acts as a
functor, the same position restrictions apply to node generation and
argument searching. The position control combined with the interactive
relaxation provides an efficient, lexically oriented, and robust
syntax analysis of spoken language.
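
The position control can be made concrete with a short sketch that lists, for a node covering morphemes i to j, every table position where a new node may be generated together with the position searched for its partner constituent. It assumes 1-based inclusive spans and reads the bounds as 1 ≤ k < i on the left and j < k ≤ N on the right; both the bounds and the function name are assumptions made for illustration.

```python
def candidate_positions(i, j, n_morphemes):
    """Return (generated position, searched position) pairs for node a(i, j).

    Spans are 1-based and inclusive. A partner on the left covers (k, i-1)
    and yields a node at (k, j); a partner on the right covers (j+1, k) and
    yields a node at (i, k).
    """
    left = [((k, j), (k, i - 1)) for k in range(1, i)]
    right = [((i, k), (j + 1, k)) for k in range(j + 1, n_morphemes + 1)]
    return left + right


# A node over morphemes 2..3 in a four-morpheme input.
for generated, searched in candidate_positions(2, 3, 4):
    print("generate at", generated, "search at", searched)
```

Each node therefore inspects at most N - (j - i + 1) partner positions rather than the whole table, which is where the efficiency of the position control comes from.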

5 Spoken Korean morphological 
analysis 
Figure 5: The TDNN-based speech recognizer

Figure 5 shows the architecture of our TDNN-based speech recognizer.
The TDNN (time-delay neural network) based phoneme recognizer gives a
sequence of phonemes for the input speech, and this phoneme sequence
is decoded by the Viterbi lexical decoder. A tree-structured
phoneme-sequence-to-morpheme dictionary is used in the lexical
decoding phase, and a morpheme lattice is extracted. This lattice is
filtered by the pairwise language model, which checks whether each
adjacent pair of morphemes in the lattice is connectable
morphologically and phonologically. The remaining morpheme lattice is
then given to the parser.
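
The dictionary lookup and the pairwise filtering can be sketched as follows. This is a simplified illustration under stated assumptions: exact phoneme matching replaces Viterbi scoring, the phoneme symbols and the entries "pha" and "il" are made-up toy data, and the connectivity information is reduced to a set of allowed morpheme pairs. All names are hypothetical.

```python
def build_trie(entries):
    """Tree-structured dictionary from (phoneme sequence, morpheme) pairs."""
    root = {}
    for phonemes, morpheme in entries:
        node = root
        for phoneme in phonemes:
            node = node.setdefault(phoneme, {})
        node.setdefault("$morphemes", []).append(morpheme)
    return root


def lattice_edges(phonemes, trie):
    """All (start, end, morpheme) edges whose phonemes match exactly."""
    edges = []
    for start in range(len(phonemes)):
        node = trie
        for end in range(start, len(phonemes)):
            node = node.get(phonemes[end])
            if node is None:
                break
            for morpheme in node.get("$morphemes", []):
                edges.append((start, end + 1, morpheme))
    return edges


def connectable_paths(edges, length, connectable):
    """Morpheme sequences spanning the input whose adjacent pairs are allowed."""
    by_start = {}
    for edge in edges:
        by_start.setdefault(edge[0], []).append(edge)
    paths = []

    def extend(position, path):
        if position == length:
            paths.append([morpheme for _, _, morpheme in path])
            return
        for edge in by_start.get(position, []):
            if not path or (path[-1][2], edge[2]) in connectable:
                extend(edge[1], path + [edge])

    extend(0, [])
    return paths


# Toy dictionary; "pha" and "il" are invented competitors for "pha-il".
TRIE = build_trie([
    (["ph", "a", "i", "l"], "pha-il"),
    (["ph", "a"], "pha"),
    (["i", "l"], "il"),
    (["t", "u", "l"], "tul"),
    (["u", "l"], "ul"),
])
CONNECTABLE = {("pha-il", "tul"), ("tul", "ul")}

phonemes = ["ph", "a", "i", "l", "t", "u", "l", "u", "l"]
edges = lattice_edges(phonemes, TRIE)
print(edges)
print(connectable_paths(edges, len(phonemes), CONNECTABLE))
```

The lattice contains the competing segmentation pha + il, but it is discarded because that pair is not listed as connectable, so only one morpheme sequence is passed on.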

6 Experimental results 

In this section, we focus on testing the effect of the extensions to
the categorial grammar and the reliability of the parser. We ran four
simulated speech recognition experiments using the extended categorial
grammars:

• Unordered argument base-line (UAB)

• Unordered argument phoneme lattice (UAP)

• Unordered argument + suppressed category base-line (UA+SCB)

• Unordered argument + suppressed category phoneme lattice (UA+SCP)

The base-line experiments measure the complexity of the corpus and the
modeling ability of the extended grammar (figure 6). A single correct
phoneme sequence for each sentence is given to the parser.

The phoneme-lattice experiments measure the parsing and selecting
ability of the parser when more than one candidate is given for each
phoneme, which increases the ambiguity (figure 7). Ten phoneme
lattices are generated for each sentence. For each correct phoneme,
about 2.2 candidates are generated according to the confusion matrix
between Korean phonemes. Each phoneme lattice was made to contain at
least one correct recognition result, so the phoneme recognition
performance is assumed to be perfect in the lattice form.

             Morphological analysis    Syntactic analysis
             (morpheme lattice)        (best 1 parse)
    UAB      100 % (33/33)             60.6 % (20/33)
    UA+SCB   100 % (33/33)             78.8 % (26/33)

Figure 6: The morphological and syntactic analysis results for correct
phoneme sequences.

             Morphological analysis    Syntactic analysis
             (morpheme lattice)        (best 1 parse)
    UAP      100 % (330/330)           35.5 % (117/330)
    UA+SCP   100 % (330/330)           61.8 % (204/330)

Figure 7: The morphological and syntactic analysis results for the
artificially made phoneme lattices.

A total of 33 sentences from the UNIX natural language interface (Lee
& Lee 1995 in press) are used in the test. The sentences (translated
into English) include:

• Show me the users who are logging in this ma- 
chine. 

• Show me the users who are in idle more than an 
hour. 

• Send them a mail. 

• Let me know how much disk space "bdragon" 
uses. 

• Change directory to the home directory of 
"bdragon". 

We built two morpheme-level phonetic dictionaries for about 1000
morphemes, one with the unordered-argument categorial information and
the other with the unordered-argument plus suppressed categorial
information. For the syntactic-level interactive relaxation, we used
the following experimentally determined parameters: upward propagation
portion p = 0.05, downward propagation portion p' = 0.03, decay ratio
d = 0.87, node generation threshold 0.51, and node removal threshold
0.066. The best parse is selected using the activation level of each
parse tree.

The morphological analysis was perfect in all four experiments, as
shown in the tables. Since each phoneme lattice was made to contain at
least one correct phoneme recognition result, the morphological
analysis must be perfect as long as every morpheme is enrolled in the
dictionary and the connectivity information covers all the morpheme
combinations. This was possible because of the small number of tested
sentences (33 sentences). These results indicate that most of the
morphological analysis errors from real speech input are actually
propagated from phoneme recognition errors.

Comparing UAB with UA+SCB shows that the lack of noun or predicate
structure modeling results in extra structural ambiguities, and this
separates UAP and UA+SCP even further. In UA+SCB we see improved
performance due to the noun and predicate structure modeling through
suppressed categories.

In UAP and UA+SCP, performance dropped by 20 to 30 percent because the
multiple candidates in the phoneme lattices result in more ambiguous
morpheme lattices, and finally in ambiguous syntactic trees. These
failures can be reduced if we generate n-best parse trees and let the
semantic processing module select the correct ones, as is usually done
in most probabilistic parsing schemes (Charniak 1994).

7 Concluding remarks 

We developed a phoneme-level integration model of speech and
linguistic processing through general morphological analysis for
agglutinative languages, and an efficient parsing scheme for that
integration. Through the unordered argument and suppressed category
extensions, we modeled Korean lexically based on the categorial
grammar formalism, and the experiments justify those extensions. The
chart-driven interactive relaxation parsing and the extended
categorial grammar showed robust handling of word-order variations and
complex word structures in Korean. The performance of the system
(78.8% and 61.8%) can be improved if we use an n-best strategy coupled
with semantic processing.

We think the system can be extended to other agglutinative languages
such as Japanese, Finnish, and Turkish, and to languages that have
complex morphological phenomena such as German and Dutch, since the
phonological and orthographic rules, the word order, and the word
structures are modeled declaratively only through the dictionary.